Data science ethics: Algorithmic bias + Data privacy

Lecture 12

Dr. Mine Çetinkaya-Rundel

Duke University
STA 199 - Fall 2022

October 6, 2022

Warm up

While you wait for class to begin…

Open your ae-09 project in RStudio, render your document, and commit and push.

Announcements

Algorithmic bias

Garbage in, garbage out

  • In statistical modeling and inference, we talk about “garbage in, garbage out” – if you don’t have good (random, representative) data, the results of your analysis will not be reliable or generalizable.
  • Corollary: Bias in, bias out.
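
A minimal sketch of "bias in, bias out", assuming a made-up population with two equally sized groups: a random sample recovers the population mean reasonably well, while a sample that over-represents one group does not. All names, group labels, and numbers below are hypothetical and chosen only for illustration.

```r
set.seed(199)  # fixed seed so the sketch is reproducible

# Hypothetical population: two equally sized groups with different means
population <- data.frame(
  group = rep(c("x", "y"), each = 5000),
  value = c(rnorm(5000, mean = 10, sd = 2), rnorm(5000, mean = 20, sd = 2))
)
true_mean <- mean(population$value)

# Representative sample: every row is equally likely to be chosen
random_rows <- sample(nrow(population), size = 500)
random_mean <- mean(population$value[random_rows])

# Non-representative sample: 90% of observations come from group "x"
x_rows <- which(population$group == "x")
y_rows <- which(population$group == "y")
biased_rows <- c(sample(x_rows, size = 450), sample(y_rows, size = 50))
biased_mean <- mean(population$value[biased_rows])

# Compare the two sample estimates with the population value
print(c(true = true_mean, random = random_mean, biased = biased_mean))
```

Collecting more data the same way would not fix the second estimate: the problem is how the sample was drawn, not how large it is.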

Google Translate

What might be the reason for Google’s gendered translation? How do ethics play into this situation?

ae-09

  • Go to the course GitHub org and find your ae-09 (repo name will be suffixed with your GitHub name).
  • Clone the repo in your container, open the Quarto document in the repo, and follow along and complete the exercises.
  • Work on Part 1 - Stochastic Parrots
  • Render, commit, and push your edits by the AE deadline – 3 days from today.

Machine Bias

A 2016 ProPublica article on an algorithm used to rate a defendant’s risk of future crime:

In forecasting who would re-offend, the algorithm made mistakes with black and white defendants at roughly the same rate but in very different ways.

  • The formula was particularly likely to falsely flag black defendants as future criminals, wrongly labeling them this way at almost twice the rate as white defendants.

  • White defendants were mislabeled as low risk more often than black defendants.
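
To see how an algorithm can make mistakes "at roughly the same rate but in very different ways", here is a small sketch with made-up counts. These are not ProPublica's data; the group labels `A` and `B`, the counts, and the helper `error_rates()` are hypothetical and chosen only to illustrate the idea.

```r
# Hypothetical counts for two groups (NOT ProPublica's data)
# tp: flagged high risk, reoffended    fp: flagged high risk, did not reoffend
# fn: flagged low risk, reoffended     tn: flagged low risk, did not reoffend
counts <- data.frame(
  group = c("A", "B"),
  tp = c(300, 200),
  fp = c(200, 100),
  fn = c(100, 200),
  tn = c(400, 500)
)

# Hypothetical helper: overall error rate plus the two kinds of error
error_rates <- function(d) {
  data.frame(
    group = d$group,
    # share of all defendants who were labeled incorrectly
    overall_error = (d$fp + d$fn) / (d$tp + d$fp + d$fn + d$tn),
    # among those who did not reoffend, share flagged high risk
    false_positive_rate = d$fp / (d$fp + d$tn),
    # among those who did reoffend, share labeled low risk
    false_negative_rate = d$fn / (d$fn + d$tp)
  )
}

rates <- error_rates(counts)
print(rates)
```

In this toy example both groups have the same overall error rate, but group A's false positive rate is twice group B's, and group B's false negative rate is twice group A's. A single accuracy number can hide this kind of difference.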

Risk score errors

What is common among the defendants who were assigned a high/low risk score for reoffending?

Risk scores

How can an algorithm that doesn’t use race as input data be racist?

ae-09

  • Go to the course GitHub org and find your ae-09 (repo name will be suffixed with your GitHub name).
  • Clone the repo in your container, open the Quarto document in the repo, and follow along and complete the exercises.
  • Work on Part 2 - Predicting ethnicity
  • Render, commit, and push your edits by the AE deadline – 3 days from today.

Data privacy

What does Google think/know about you?

Have you ever thought about why you’re seeing an ad on Google? Google it! Try to figure out if you have ad personalization on and how your ads are personalized.


Privacy of your data

What pieces of data have you left on the internet today? Think through everything you’ve logged into, clicked on, or checked in to, either actively or automatically, that might be tracking you. Do you know where that data is stored? Who can access it? Whether it’s shared with others?

More…

  • Mention web scraping
  • Mention data science ethics oath
  • Reproducibility? Good that they’re already learning… so much more to it than what we cover in class…